Chronic GVHD was recognized as a complication of allogeneic hematopoietic cell 
transplantation more than 30 years ago, but progress has been slowed by the limited 
insight into the pathogenesis of the disease and the mechanisms that lead to development 
of immunological tolerance. Only 6 randomized phase III treatment studies have been 
reported. Results of retrospective studies and prospective phase II clinical trials suggested 
overall benefit from treatment with mycophenolate mofetil or thalidomide, but these 
results were not substantiated by phase III studies of initial systemic treatment for chronic 
GVHD. A comprehensive review of published reports showed numerous deficiencies in 
studies of secondary treatment for chronic GVHD. Fewer than 10% of reports 
documented an effort to minimize patient selection bias, used a consistent treatment 
regimen, or tested a formal statistical hypothesis that was based on a contemporaneous 
or historical benchmark. In order to enable valid comparison of the results from different 
studies, eligibility criteria, definitions of individual organ and overall response, and time 
of assessment should be standardized. Improved treatments are more likely to emerge 
if reviewers and journal editors hold authors to higher standards in evaluating manuscripts 
for publication. 
INTRODUCTION - THE PAST 



Allogeneic hematopoietic cell transplantation (HCT) is 
frequently complicated by acute and chronic graft-versus-host 
disease (GVHD) [1, 2]. Although considerable progress 
has been made in the development of methods to prevent 
or treat acute GVHD, progress in chronic GVHD 
has lagged since the clinical and 
pathologic features of this syndrome were first described 
in 1980 [3]. Interest in this debilitating complication of HCT 
was rejuvenated when recommendations of the National 
Institutes of Health Consensus Conference on Criteria for 
Clinical Trials in Chronic Graft-versus-host disease were 
published in 2005-2006 [4-9]. 

The Consensus Conference recognized two major categories 
of GVHD, each with 2 subcategories [4]. Acute GVHD 
with onset before day 100 was defined as "classic acute GVHD." 
A separate category was recognized for persistent, recurrent 
or late-onset acute GVHD beyond day 100 after HCT. 
Chronic GVHD was separated from acute GVHD not by 
time from HCT but by the presence of diagnostic criteria 
or by distinctive findings supported by biopsy or other 
procedures. Classic chronic GVHD was defined by unam- 
biguous chronic GVHD manifestations in the absence of 
abnormalities such as cutaneous erythema, liver function 
abnormalities, or gastrointestinal manifestations typical of 
acute GVHD. "Overlap syndrome" is a subcategory of chronic 
GVHD characterized by chronic GVHD in the presence of 
one or more manifestations typical of acute GVHD. 

Chronic GVHD is a pleomorphic syndrome with "autoimmune" 
features that sometimes resemble clinical findings 
in scleroderma and Sjögren syndrome. The onset usually 
occurs between 3 and 15 months after HCT [10-13]. Factors 
associated with an increased risk of chronic GVHD 
include the use of a mobilized blood cell graft or an 
HLA-mismatched or unrelated donor, older patient age and 
a history of acute GVHD [12]. The risk of chronic GVHD 
can be decreased by exhaustive depletion of T cells from 
the graft or by treatment of the recipient with rabbit 
antibodies specific for human T cells as part of the conditioning 
regimen before HCT [12, 14, 15]. Without these 
measures, approximately 30% to 50% of HCT recipients 
develop chronic GVHD [2, 16]. 

Chronic GVHD can affect multiple organs and sites, in- 
cluding the skin and subcutaneous connective tissues, 
lacrimal and salivary glands, oral mucosa, lungs, esophagus, 
joints, gastrointestinal tract and liver. The disease is char- 
acterized by immune dysfunction with an increased risk 
of infections and a 30% to 50% risk of mortality during 
the first 5 years after diagnosis [10, 11]. Chronic GVHD 
has been associated with a reduced risk of recurrent 
malignancy after HCT [17-22], but despite this benefit, 
survival is not improved [22]. A prognostic scoring system 
has recently been proposed based on factors present at the 
time of chronic GVHD diagnosis [13]. 



PRINCIPLES OF TREATMENT - THE PRESENT 



To date, only 6 randomized phase III studies have ever 
been reported for initial treatment of chronic GVHD [23-28]. 
The study by Koc et al. [26] was the only one that indicated 
benefit. Results of this study suggested that treatment with 
cyclosporine reduced the amount of glucocorticoid treatment 
needed to control the disease, as indicated by a decreased 
frequency of avascular necrosis. The generally recommended 
approach for treatment of chronic GVHD involves continued 
administration of the calcineurin inhibitor used for GVHD 
prophylaxis together with prednisone initially at 1 mg/kg/day 
[2, 29, 30]. Strategies for tapering the dose of prednisone 
vary considerably, but as a general principle, efforts should 
be made to use the minimum dose that is sufficient to control 
GVHD manifestations. At our center, the dose of prednisone 
is tapered to an alternate-day schedule of administration 
after initial clinical improvement, which generally occurs 
within 6 weeks after starting treatment. The dose of prednisone 
is then tapered to 0.5 mg/kg every other day and generally 
continued until reversible manifestations of the disease resolve. 
The dose of prednisone is then gradually tapered with careful 
monitoring for recurrent manifestations of chronic GVHD. 
Doses of the calcineurin inhibitor are gradually decreased 
after treatment with prednisone has been withdrawn. 

The median duration of treatment for chronic GVHD is 
approximately 2 years in patients who had HCT with marrow 
cells and approximately 3.5 years in those who had HCT 
with growth factor-mobilized blood cells [31]. The current 
therapeutic approach functions primarily to prevent immune- 
mediated damage, while awaiting the development of toler- 
ance. Evidence to suggest that current treatments accelerate 
the development of immunological tolerance is lacking. The 
mechanisms that facilitate development of tolerance have 
not been defined. 

Administration of medications to prevent infection with 
Pneumocystis jirovecii and encapsulated bacteria is necessary 
during treatment for chronic GVHD [32]. Some patients 
may need topical or systemic treatment to prevent mucocutaneous 
Candida infection. Patients at risk of varicella zoster virus 
reactivation should be given antiviral prophylaxis, and CMV 
monitoring and preemptive treatment are necessary in patients 
at risk of CMV infection [33]. Activation of CMV during 
the first 3 months after HCT suggests an increased risk of 
subsequent reactivation in patients with chronic GVHD. 
Systemic immunosuppressive treatment should be admin- 
istered at the lowest effective dose in order to minimize 
the risk of infections and other complications. Many steroid- 
related complications can be avoided or at least minimized 
by an alternate-day schedule of administration [34], and 
topical treatment can be used to minimize the need for 
systemic treatment [35]. Bone mineral density should be 
monitored yearly, and losses should be minimized through 
weight-bearing exercise, administration of calcium and 
vitamin D supplements, and hormone replacement. 

Indications for secondary treatment include worsening 
manifestations in a previously affected organ, development 
of manifestations in a previously unaffected organ, absence 
of improvement after 1 month of treatment, or inability 
to decrease the dose of prednisone below 1.0 mg/kg/day 
within 2 months [30, 36]. Numerous clinical trials have been 
carried out to evaluate approaches for secondary treatment 
of chronic GVHD. To date, no consensus has been reached 
regarding the optimal choice of agents for secondary 
treatment, and clinical management is generally approached 
through empirical trial and error [36]. Treatment choices 
are based on physician experience, ease of use, need for 
monitoring, risk of toxicity and potential exacerbation of 
pre-existing co-morbidity. 



MYCOPHENOLATE MOFETIL: A CASE STUDY 
ILLUSTRATING CURRENT PROBLEMS 



Progress in the clinical management of chronic GVHD 
has been slowed by limited insight into the pathogenesis 
of the disease and the mechanisms that lead to development 
of immunological tolerance. In the absence of pathophy- 
siologic understanding, physicians must rely on personal or 
published empirical experience in making decisions regard- 
ing treatment. In principle, results of treatment in patients 
with "steroid-refractory" or "steroid-resistant" chronic GVHD 
could be used to identify promising agents for initial treat- 
ment. Effective agents would be expected to decrease reliance 
on glucocorticoids and could conceivably decrease the dura- 
tion of time needed for resolution of the disease. 

A variety of retrospective and phase II studies suggested 
that mycophenolate mofetil (MMF) could be used successfully 
for secondary treatment of chronic GVHD. In results of a survey 
published in 2002, nearly 80% of clinicians reported that they 
had used MMF with great success or at least some 
success for treatment of chronic GVHD [37]. In another survey 
proposing a hypothetical scenario describing a case of high-risk 
chronic GVHD, 54% of the respondents indicated that they 
would add MMF for initial treatment of chronic GVHD [38]. 
These results supported a formal test of the hypothesis that 
the addition of MMF to standard initial treatment could 
improve outcomes for patients with chronic GVHD. 

We therefore conducted a prospective, double-blind, ran- 
domized phase III clinical trial to test this hypothesis [27]. 
The primary endpoint was resolution of reversible mani- 
festations of chronic GVHD within 2 years after enrollment, 
before death or the onset of recurrent malignancy and 
without the need for secondary systemic treatment. It was 
expected that the use of MMF would shorten the time to 
response, decrease systemic steroid exposure, and decrease 
the risk of transplant-related mortality without increasing 
the risk of recurrent malignancy, thereby potentially im- 
proving overall survival. Results of the trial, however, did 
not show any benefits of treatment with MMF. Potential 
reasons for the negative results were thoroughly explored. 
The absence of success in this randomized trial could not 
be attributed to an imbalance of risk factors between the 
arms, sub-optimal dosing of MMF or non-adherence with 
administration of the study drug. Hence, this clinical trial 
definitively demonstrated that addition of MMF to standard 
initial treatment did not improve outcomes for patients with 
chronic GVHD. 

These unexpected results conflicted with previously pre- 
vailing clinical impressions and motivated a careful review 
of prior reports evaluating the use of MMF for treatment 
of chronic GVHD. Overall results of 9 such studies suggested 
that secondary treatment with MMF produced a 20% 
complete response rate and a 65% complete or partial 
response rate (Table 1) [39-47]. One of these studies also 
evaluated results in 10 patients who received MMF as part 
of the initial treatment regimen for chronic GVHD [45]. 
Seven of the 10 patients had a complete response, and 2 
had a partial response, yielding an overall 90% rate of 
complete or partial response. In addition, a further study 
from our center had shown that the proportion of patients 
who discontinued systemic immunosuppressive treatment 
after resolution of reversible abnormalities increased pro- 
gressively from 9% to 17% and 26% at 1, 2 and 3 years, 
respectively, after starting treatment with MMF [48]. 
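
The pooled rates quoted above can be checked directly from the per-study counts reported in Table 1. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative only, and simple pooling of counts is assumed here rather than any weighted or meta-analytic method.

```python
# Per-study counts from Table 1: (study type, N, complete responses, partial responses).
studies = [
    ("Retrospective", 26, 2, 10),
    ("Prospective", 5, 2, 3),
    ("Prospective", 21, 5, 8),
    ("Prospective", 15, 2, 7),
    ("Retrospective", 11, 3, 5),
    ("Retrospective", 13, 1, 9),
    ("Retrospective", 13, 4, 5),
    ("Retrospective", 24, 5, 13),
    ("Prospective", 11, 4, 3),
]


def pooled_rates(rows):
    """Return total N, pooled CR rate and pooled CR+PR rate by summing counts."""
    n = sum(row[1] for row in rows)
    cr = sum(row[2] for row in rows)
    pr = sum(row[3] for row in rows)
    return n, cr / n, (cr + pr) / n


n, cr_rate, overall_rate = pooled_rates(studies)
print(f"N={n}, CR={cr_rate:.0%}, CR+PR={overall_rate:.0%}")
# prints "N=139, CR=20%, CR+PR=65%"
```

Pooling of this kind reproduces the totals in the table but does not account for differences in eligibility criteria, response definitions or time of assessment among the studies.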

Table 1. Response rates in prior studies of mycophenolate mofetil.

Study   Type            N     CR         PR    CR+PR (%)
1       Retrospective   26    2          10    12 (46)
2       Prospective     5     2          3     5 (100)
3       Prospective     21    5          8     13 (62)
4       Prospective     15    2          7     9 (60)
5       Retrospective   11    3          5     8 (73)
6       Retrospective   13    1          9     10 (77)
7       Retrospective   13    4          5     9 (69)
8       Retrospective   24    5          13    18 (75)
9       Prospective     11    4          3     7 (64)
Total                   139   28 (20%)   63    91 (65)

Similar discrepancies have been observed in studies to 
evaluate the efficacy of thalidomide for treatment of chronic 
GVHD. Results of 6 retrospective studies and prospective 
phase II clinical trials suggested favorable outcomes with 
the use of thalidomide for secondary treatment of chronic 
GVHD [49-54]. The two randomized prospective studies 
testing the use of thalidomide for primary treatment of 
chronic GVHD, however, showed no benefit [24, 25]. 

Results of the randomized trials defied expectations 
coming from at least 16 studies evaluating the use of MMF 
or thalidomide for treatment of chronic GVHD. At least 
2 explanations could account for this discrepancy. 
1) Results of secondary treatment might not predict efficacy 
as an added agent for primary treatment, perhaps because 
most patients do not need additional agents in order to gain 
maximal benefit from initial treatment. Experience at our 
center, however, has indicated that systemic treatment is 
changed in approximately 60% of patients during the first 
3 years because of inadequate response to primary therapy 
for chronic GVHD [55]. 2) Alternatively, previous studies 
might have had unrecognized limitations leading to overstated 
expectations. 



PROPOSED QUALITY INDICATORS FOR 
EVALUATING REPORTS 



Previous publications have identified quality indicators 
for evaluation of phase III clinical trials [56, 57], but to 
our knowledge, similar quality criteria have not been pre- 
viously proposed for phase II trials and retrospective studies. 
Therefore, before embarking on a detailed review of the 
10 previous studies evaluating the use of MMF, we developed 
a list of 10 quality indicators that could be used to characterize 
an ideal prospective phase II clinical trial or retrospective 
study of treatment for GVHD. The proposed quality indi- 
cators are summarized below. 

1. Adequately defined eligibility criteria 

Inclusion and exclusion criteria should specify affected 
sites, severity of manifestations, and prior treatment used 
to define the cohort. Exclusion criteria should indicate 
whether factors such as the presence of infection, inability 
to tolerate the study treatment, presence of persistent malig- 
nancy or low performance score were used to define the 
cohort. Studies intended to evaluate treatment of "steroid- 
refractory" GVHD should indicate the glucocorticoid dose 
and duration of treatment used to define the cohort. Eligi- 
bility criteria are typically more precisely defined for pro- 
spective studies than for retrospective studies. Data from 
retrospective studies describing all patients who received 
the study treatment of interest are difficult to interpret unless 
additional selection criteria are applied to improve homo- 
geneity within the study cohort. 

2. Documented minimization of bias in the selection of patients 

Readers should be given enough information to determine 
whether the characteristics of the patients included in a 
study are representative of the more general population of 
patients with chronic GVHD. Risk factors that could affect 
outcome should be delineated. Ideally, either an historical 
or contemporaneous cohort should be identified for com- 
parison, and any differences in the prevalence of risk factors 
between the study cohort and the comparison cohort should 
be noted. The use of randomization to define cohorts helps 
to ensure the absence of bias, but this procedure does not 
ensure that the study cohort is representative of the more 
general population of patients with the indication of interest. 
Enrollment of all consecutive patients who meet eligibility 
criteria can ensure that the cohort is representative of the 
more general population, but this approach would raise 
concerns about the adequacy of informed consent. Thus, 
comparisons of demographics and risk factors between 
patients who participated and those who did not are crucial. 

3. Consistent treatment regimen 

The study treatment of interest should be administered 
in a consistent manner in dose, schedule and duration of 
administration. Differences in dose, schedule or duration 
of administration can be addressed by stratified analysis of 
each specific subgroup. As much as possible, concomitant 
treatment with immunosuppressive agents other than gluco- 
corticoids should also be administered in a consistent manner 
in order to facilitate the interpretation of results. Such con- 
sistency greatly improves the ability to interpret results and 
to confirm the results in subsequent studies. Concomitant 
treatment can be standardized more easily in studies of initial 
therapy for standard or high-risk disease and for secondary 
therapy than in studies of subsequent therapies. For third-line 
or subsequent therapy, such consistency is feasible only if 
prior treatment with agents other than glucocorticoids is 
discontinued. 

4. Objective criteria for organ response 

Categorical criteria should be defined for complete 
response, partial response, no change, and worsening for 
each organ or site affected by GVHD, even if organ response 
criteria have not been validated, since conclusions of the 
study are based on response rates. Definitions require formal 
assessment at baseline and at the comparison time point. 
In many studies, partial response was defined as "at least 
50% improvement" in disease manifestations. This terse, and 
likely oversimplified, definition meets the formal criterion 
of objectivity but suggests that the response assessment 
actually reflects a general overall impression, as opposed 
to a detailed comparison of changes in chronic GVHD 
manifestations in each organ between baseline and the 
assessment time. 

5. Unambiguous criteria for overall response 

The definition of overall response is distinct from the 
criteria for organ response. Overall responses are often 
defined according to the overall pattern of organ responses. 
At a minimum, overall partial response indicates improvement 
in at least one organ. The category assigned for patients 
with improvement in one organ but deterioration in another 
organ should be clearly stated. 

6. Specified time for assessment of response 

To facilitate comparisons between studies, at least one 
specified time point should be used for assessment of re- 
sponse, and the data for this assessment should be shown. 
Additional information can be shown as a time to event 
analysis. The number of patients who died or had recurrent 
malignancy before the assessment time point should be 
specified, and results should clearly indicate whether these 
patients were excluded from consideration in the assessment 
of response or whether they were included as non-responders. 
Tabulation of results according to "best response" or "last 
value carried forward" is not appropriate, since these categories 
do not reflect clinical benefit at a specific time point. 

7. Concomitant treatment taken into account 

New systemic treatment for GVHD added after enrollment 
but before the assessment time point because of inadequately 
controlled disease manifestations should be categorized as 
non-response. Even in studies that use "best response" as 
the endpoint, the text should state whether response was 
evaluated before any new systemic treatment was added. 
Changes in glucocorticoid dose should be described, but a 
temporary small increase in glucocorticoid dose during a 
taper should not be categorized as non-response, because 
temporary flares of GVHD activity cannot be avoided when 
conscientious efforts are made to determine the minimum 
glucocorticoid dose needed to control GVHD. 
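
The rules in items 6 and 7 can be made explicit as a small classification routine applied at a single prespecified time point. The sketch below is illustrative only: the class and field names are hypothetical, and it assumes the convention that death, recurrent malignancy, or new systemic treatment before the assessment time counts as non-response, with all enrolled patients kept in the denominator. A report that excludes such patients instead would need to say so.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Status of one patient at the prespecified assessment time point."""
    overall_response: str                 # "CR", "PR", "NC" (no change) or "PD" (worsening)
    died: bool = False                    # death before the assessment time point
    recurrent_malignancy: bool = False    # recurrent malignancy before the assessment time point
    new_systemic_treatment: bool = False  # does not include changes in glucocorticoid dose


def classify(a: Assessment) -> str:
    """Return "CR", "PR" or "non-response" under the assumed convention."""
    if a.died or a.recurrent_malignancy or a.new_systemic_treatment:
        return "non-response"
    if a.overall_response in ("CR", "PR"):
        return a.overall_response
    return "non-response"


def response_rate(cohort) -> float:
    """Proportion of all enrolled patients classified as CR or PR."""
    responders = sum(classify(a) != "non-response" for a in cohort)
    return responders / len(cohort)


# Hypothetical cohort of 5 patients.
cohort = [
    Assessment("CR"),
    Assessment("PR"),
    Assessment("PR", new_systemic_treatment=True),  # counted as non-response
    Assessment("NC"),
    Assessment("NC", died=True),                    # counted as non-response
]
print(response_rate(cohort))  # prints 0.4
```

A temporary small increase in glucocorticoid dose during a taper is deliberately not represented as a failure event in this sketch.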

8. Well-established control benchmark 

A specific historical or concurrent control benchmark 
should be used to establish a null hypothesis for the primary 
endpoint. Response criteria for the benchmark and study 
cohorts should be identical or closely similar. 

9. Statistical hypothesis and power estimate 

The methods should provide values for the null and alternative 
hypotheses and for the one-sided or two-sided type 
I error, together with estimates of statistical power and the 
necessary sample size. Although these considerations might 
be difficult to apply in retrospective studies, they should 
always be applied in prospective studies. 
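
As an illustration of what such a statement involves, the sketch below searches for the smallest single-arm sample size at which an exact one-sided binomial test meets a chosen type I error and power. This single-stage exact binomial design is one common choice and is used here only as an example; the benchmark response rate of 30% and the target rate of 50% are hypothetical values, not figures taken from the reviewed studies.

```python
from math import comb


def binom_tail(n: int, r: int, p: float) -> float:
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def single_stage_design(p0, p1, alpha=0.05, power=0.80, max_n=200):
    """Smallest n (and cutoff r) such that rejecting H0: p <= p0 when the
    number of responders is >= r has type I error <= alpha at p0 and
    power >= `power` at p1. Returns (n, r, achieved alpha, achieved power),
    or None if no design with n <= max_n qualifies."""
    for n in range(1, max_n + 1):
        # Smallest cutoff that controls the type I error; r = n + 1 means "never reject".
        r = next(r for r in range(0, n + 2) if binom_tail(n, r, p0) <= alpha)
        achieved_power = binom_tail(n, r, p1)
        if achieved_power >= power:
            return n, r, binom_tail(n, r, p0), achieved_power
    return None


# Hypothetical benchmark (null) and target (alternative) response rates.
design = single_stage_design(p0=0.30, p1=0.50)
if design is not None:
    n, r, alpha_achieved, power_achieved = design
    print(f"n={n}, reject H0 if responders >= {r}, "
          f"alpha={alpha_achieved:.3f}, power={power_achieved:.3f}")
```

The null value p0 should come from the historical or concurrent benchmark described in item 8, assessed with the same response criteria as the study cohort.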

10. Survival 

The results should show survival of the cohort from the 
onset of study treatment. Kaplan-Meier curves should show 
tick marks depicting end of follow-up, especially if the 
minimum follow-up time for surviving patients is less than 
6 months. Alternatively, results can be shown in tables 
indicating time to death or last follow-up for each patient. 
When response definitions differ, survival data provide the 
only gauge that can be used as a simple and universally 
applicable method for comparisons with other studies. 
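
For readers less familiar with the method, a Kaplan-Meier estimate can be computed from the per-patient table of time to death or last follow-up mentioned above. The following is a minimal sketch using hypothetical data; it assumes the usual convention that patients censored at a given time are still counted as at risk for deaths at that same time.

```python
def kaplan_meier(times, events):
    """times: follow-up from onset of study treatment (e.g. months).
    events: True if the patient died at that time, False if censored
    (end of follow-up). Returns [(time, survival probability)] at each
    time where at least one death occurred."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, died in data if tt == t and died)
        leaving = sum(1 for tt, _ in data if tt == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
        i += leaving
    return curve


# Hypothetical cohort of 6 patients.
times = [2, 3, 3, 5, 8, 12]
events = [True, True, False, True, False, False]

curve = kaplan_meier(times, events)
censor_times = sorted(t for t, died in zip(times, events) if not died)  # tick marks

for t, s in curve:
    print(f"month {t}: survival {s:.3f}")
print("censored at months:", censor_times)
```

The censoring times are the positions at which tick marks would be drawn on the curve to depict end of follow-up.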

Table 2. Quality of prior reports of studies testing mycophenolate 
mofetil.

Quality criterion                  Reports meeting criterion (of 10)
Eligibility criteria               3
Minimization of selection bias     0
Consistent treatment regimen       3
Objective response criteria        2
Overall response criteria          2
Time of assessment                 1
Concomitant treatment              7
Historical benchmark               0
Statistical hypothesis             0
Survival curve                     2

Study number                       1   2   3   4   5   6   7   8   9   10
Criteria met (of 10)               1   2   2   4   2   2   1   0   2   4


EVALUATION OF PRIOR REPORTS FOR 
STUDIES OF MMF 



Two individuals (YI and PM) independently reviewed 
the 10 prior reports of studies testing the use of MMF for 
secondary treatment of chronic GVHD [39-48]. Reports were 
evaluated according to whether each quality criterion was 
met or not, based on careful reading of the text. Differences 
in scores were reconciled by joint review to arrive at a 
consensus. Since the purpose of publication is to persuade 
others, application of the criteria was very strict, and no 
credit was given if the text did not address the criterion 
or if the text was not clear. Therefore, in many cases, defi- 
ciencies in the report might not have been representative 
of a study as it was actually conducted. 
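
The scoring and reconciliation procedure described above lends itself to a simple tabulation. The sketch below shows one way to compute percent agreement between two reviewers for each criterion and the total score for a report; the data structures, report identifiers and ratings are hypothetical and are not the ratings from this review.

```python
def percent_agreement(reviewer_a, reviewer_b, criteria):
    """Each reviewer argument maps report id -> {criterion: True if met}.
    Returns {criterion: percent of reports on which the reviewers agreed}."""
    reports = list(reviewer_a)
    return {
        c: 100.0 * sum(reviewer_a[r][c] == reviewer_b[r][c] for r in reports) / len(reports)
        for c in criteria
    }


def total_score(consensus_ratings):
    """Number of criteria met by one report after reconciliation."""
    return sum(consensus_ratings.values())


# Hypothetical ratings for 4 reports on 2 of the 10 criteria.
criteria = ["Eligibility criteria", "Time of assessment"]
reviewer_a = {
    "r1": {"Eligibility criteria": True, "Time of assessment": False},
    "r2": {"Eligibility criteria": True, "Time of assessment": False},
    "r3": {"Eligibility criteria": False, "Time of assessment": True},
    "r4": {"Eligibility criteria": False, "Time of assessment": False},
}
reviewer_b = {
    "r1": {"Eligibility criteria": True, "Time of assessment": False},
    "r2": {"Eligibility criteria": False, "Time of assessment": False},
    "r3": {"Eligibility criteria": False, "Time of assessment": True},
    "r4": {"Eligibility criteria": False, "Time of assessment": False},
}

agreement = percent_agreement(reviewer_a, reviewer_b, criteria)
print(agreement)
# prints {'Eligibility criteria': 75.0, 'Time of assessment': 100.0}
```

Percent agreement is easy to interpret but does not correct for agreement expected by chance, which can be substantial when a criterion is almost never met.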

Results for the 10 studies of MMF are summarized in 
Table 2. Scores at the bottom of the table represent the 
total number of criteria met by each report. One report 
failed to meet any of the 10 criteria. Two reports met 4 
criteria, and none had higher scores. The mean score for 
the 10 reported studies was 2.0. Scores at the right margin 
of the table represent the number of reports that met each 
criterion. None of the reports attempted to demonstrate that 
bias had been minimized in the selection of patients, used 
an historical or contemporaneous benchmark or tested a 
statistical hypothesis. Only one report had a specified time 
of assessment, and only two had objective response criteria 
and well-defined overall response criteria. Three reports 
employed a consistent treatment regimen, while 7 accounted 
for possible effects of concomitant treatment. 



COMPREHENSIVE REVIEW OF PRIOR REPORTS 



Results of the review of reports for studies testing MMF 
prompted a more comprehensive review of studies testing 
systemic agents for secondary treatment of chronic GVHD 



Table 3. Initial agreement between evaluators. 3 ' 



Quality criterion 


Agreement 


Eligibility criteria 


87% 


Minimization of selection bias 


97% 


Consistent treatment regimen 


92% 


Objective response criteria 


75% 


Overall response criteria 


78% 


Time of assessment 


88% 


Concomitant treatment 


72% 


Historical benchmark 


97% 


Statistical hypothesis 


98% 


Survival curve 


87% 



al Each of the 60 selected reports was independently evaluated by 2 
reviewers. Results in the table indicate the percent agreement 
between the 2 reviewers for each quality criterion. 



published between 1990 and 2011. We searched the Medline 
(PubMed) database using a broad search strategy to identify 
studies evaluating secondary treatment of chronic GVHD. 
The search was conducted using the terms "Chronic graft 
versus host disease" and "Treatment" excluding '"Review." 
Relevant references in the publications identified were also 
reviewed. Both retrospective and prospective studies were 
included, but studies with cohorts containing fewer than 
10 patients (N=26), phase III studies and case reports were 
excluded. A total of 60 studies were selected for review 
[39-54, 58-101]. Initial agreement between the two reviewers 
was high, ranging between 72% and 98% (Table 3). 

Across the 60 studies, 17 different agents were evaluated 
(Fig. 1). Extracorporeal photopheresis was the most fre- 
quently studied agent (N=17) followed by mycophenolate 
mofetil (N=10), thalidomide (N=6), sirolimus or everolimus 
(N=4) and rituximab (N=4). The distribution of scores re- 
presenting the total number of criteria met by each report 
ranged from a low of 0 (N=6) to a high of 8 (N=1) [61] (Fig. 2). The 
mean score for all 60 reports was 2.5. The mean score for 
prospective studies (N=31) was 3.1, compared to 1.8 for 
retrospective studies (N=29). The mean score for multicenter 
studies (N=7) was 3.6, compared to 2.3 for single-center 
studies (N=53). 
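
The subgroup means reported above should be consistent with the overall mean of 2.5, since each pair of subgroups partitions the same 60 reports. A quick check of this arithmetic is sketched below; the function name is illustrative, and small differences are expected because the published means are rounded to one decimal place.

```python
def pooled_mean(groups):
    """groups: list of (number of reports, mean score). Returns the weighted mean."""
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


by_design = [(31, 3.1), (29, 1.8)]   # prospective, retrospective
by_setting = [(7, 3.6), (53, 2.3)]   # multicenter, single-center

print(f"{pooled_mean(by_design):.2f}")   # prints 2.47
print(f"{pooled_mean(by_setting):.2f}")  # prints 2.45
```

Both weighted means round to the reported overall mean of 2.5.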

Approximately 35% to 45% of all reports provided 
adequate information regarding eligibility criteria, organ 
response criteria, overall response criteria, concomitant 
treatment and overall survival (Table 4). Only 22% of the 
reports had a specified time for assessment of response, and 
fewer than 10% of the reports documented an absence of bias 
in the selection of patients, used a consistent treatment 
regimen, or tested a formal statistical hypothesis on the basis 
of a benchmark from a contemporaneous or historical cohort. 
The percentage of reports fulfilling quality indicators was 
generally higher for prospective studies than for retrospective 
studies (Table 4). 

Fig. 1. Treatments evaluated in prior reports. Treatments are listed in 
order of frequency among the 60 reports included in the literature 
review (horizontal axis: number of reports): extracorporeal photopheresis, 
mycophenolate mofetil, thalidomide, sirolimus or everolimus, rituximab, 
pentostatin, methotrexate, tacrolimus, imatinib, thoraco-abdominal 
irradiation, pulse steroid, mesenchymal stem cells, etretinate, 
etanercept, clofazimine, chloroquine, alefacept. 

Fig. 2. Distribution of scores representing the total number of criteria 
met by each report for the 60 studies included in the literature review. 

Table 4. Quality of prior reports. a)

                                   Percent of reports fulfilling criterion
Quality criterion                  Total       Retrospective   Prospective
                                   (N = 60)    (N = 29)        (N = 31)
Eligibility criteria               38%         17%             58%
Minimization of selection bias     5%          3%              6%
Consistent treatment regimen       8%          14%             3%
Objective response criteria        45%         24%             65%
Overall response criteria          43%         24%             52%
Time of assessment                 22%         10%             32%
Concomitant treatment              38%         21%             55%
Historical benchmark               2%          0%              3%
Statistical hypothesis             3%          0%              6%
Survival curve                     38%         45%             32%

a) Data in the table indicate the percentage of reports in each 
category that were judged to meet each of the indicated quality 
criteria. 



SHORTCOMINGS IN THE CURRENT STATE OF AFFAIRS 



Despite their many shortcomings, all 10 reports evaluating 
MMF offered favorable overall assessments, 8 in the abstract, 
and 2 in the discussion. All 10 reports called for additional 
studies, 3 in the abstract, and 7 in the discussion. The contrast 
with results of the prospective phase III trial testing MMF 
for initial treatment of chronic GVHD raises a general 
concern that other previously tested agents also do not 
provide as much benefit as suggested in the reports. The 
approach used in most reports relies on the assumption that 
any improvement after new treatment must have resulted 
from the new treatment, but most of the studies did not 
attempt to assess the durability of response. Taken as a whole, 
the collection of reports does not facilitate comparisons of 
efficacy from one agent to the next, and readers are left 
to conclude that everything works, more or less. 

Investigators prefer new treatment to be effective, and 
under the "publish or perish" pressures of academic life, 
authors may lose objectivity and attempt to portray results 
as positively as possible. None of the 60 reviewed reports 
indicated negative overall results, strongly suggesting a 
powerful bias by authors and journals to publish only the 
results of "positive" studies. Conclusions from retrospective 
studies and phase II clinical trials should be stated more 
cautiously. For example, we suggest that an appropriate 
conclusion from the studies of MMF would be the following: 
"Our results demonstrate the feasibility of using MMF to 
treat chronic GVHD. The true merits of using MMF for 
this indication can be evaluated only in a prospective 
controlled trial". Small retrospective studies have very 
limited value for assessing results of a new treatment, and 
the distinction between retrospective studies and prospective 
studies is important. Nonetheless, many prospective phase 
II studies still fall far short of the ideal. 

Progress would be enhanced if studies could be conducted 
in a way that allows results to be compared from one study 
to the next in a more informative way. Aggregation of results 
for secondary therapy with those for third, fourth and 
subsequent lines of treatment makes such comparisons im- 
possible, due to large variation in prior treatment and 
concomitant therapy. Comparisons are also impeded by an 
inability to estimate the baseline prognosis of patients 
enrolled in any given study as compared to those enrolled 
in other studies. 

The current state of affairs has many harmful effects. Most 
reviews that summarize previous literature regarding treat- 
ment of chronic GVHD focus on overall complete and partial 
responses, leading readers to uncritical acceptance of conclusions 
that agents are effective when, in fact, they are 
not. Agents that are accepted as effective could actually 
cause unrecognized harm, as suggested by results of the phase 
III MMF study [27]. Clinicians who believe that they already 
know what is best have little incentive to participate in 
clinical trials. As a consequence, progress has stalled, and 
no one is able to identify new treatments that are truly 
effective. Progress would be enhanced if investigators could 
identify truly promising results from phase II trials and move 
forward more quickly to testing in definitive phase III trials. 



THE NEED FOR STANDARDIZATION 
- A PROPOSAL FOR THE FUTURE 



Progress would be greatly enhanced by standardization 
in 4 areas: eligibility criteria, organ response criteria, overall 
response criteria, and time of assessment. Eligibility criteria 
should focus on true secondary systemic therapy, rather than 
allowing enrollment at any point beyond primary therapy. 
This strategy will provide a more homogeneous population 
and more interpretable results than one that allows enroll- 
ment at any point in the development of the disease. Criteria 
defining "steroid-resistant" and "steroid-refractory" chronic 
GVHD should be standardized. Lack of adequate improve- 
ment after at least one month of treatment with prednisone 
at 1 mg/kg per day represents one possible definition, al- 
though some chronic GVHD manifestations would not be 
expected to improve within a month after starting treatment. 

Standardized response criteria have been proposed but 
have not been used because of their complexity and lack 
of validation [7]. Death, recurrent malignancy or a further 
change in systemic treatment other than the dose of pred- 
nisone before the assessment point should not qualify as 
a response. Measures of response will have to be simplified 
and validated in order to gain wider acceptance. For example, 
the NIH scales for mild (score 1), moderate (score 2) and 
severe (score 3) chronic GVHD could be used to standardize 
organ response criteria. The same scale could be used to 
standardize overall response criteria, although the appro- 
priate classification for cases with improvement in one organ 
with progression in another would still pose difficulty. 
Alternatively, changes measured according to the NIH scale 
for global severity of chronic GVHD could be used to stand- 
ardize overall response, since several studies have shown 
that the NIH global severity at initial diagnosis correlates 
with survival [102, 103], although changes in global severity 
have not yet been correlated with survival. Further evidence 
from retrospective or prospective studies will be needed to 
reach consensus on standards for assessment of treatment 
response in patients with chronic GVHD. 
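
The exclusion rule stated earlier in this paragraph (death, recurrent malignancy, or a change in systemic treatment other than the prednisone dose before the assessment point cannot count as a response) can be written as a simple check. The following Python sketch is illustrative only: the record layout and all field names are hypothetical, and the improvement criterion is left as a placeholder because the text does not settle which organ or global scale should be used.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Status of one patient at the assessment point (hypothetical layout)."""
    died: bool
    recurrent_malignancy: bool
    changed_systemic_treatment: bool  # any change other than the prednisone dose
    improved: bool                    # placeholder for the adopted response scale


def counts_as_response(a: Assessment) -> bool:
    """Apply the exclusion rule first, then the improvement criterion."""
    if a.died or a.recurrent_malignancy or a.changed_systemic_treatment:
        return False
    return a.improved
```

Under this sketch, a patient whose scores improved only after a new systemic agent was added would be classified as a non-responder.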

Many studies have used short-term response to assess new 
therapies for chronic GVHD, but at least one prior study 
showed that response at 3 or 6 months does not predict 
resolution of the disease [104]. On the other hand, results 
of this study showed that several definitions of response 
could be used together with the additional criterion of a 
prednisone dose ≤0.25 mg/kg/day to predict the risk of 
subsequent failure, defined as death, onset of bronchiolitis 
obliterans, or introduction of a new systemic treatment 
because of new or progressive manifestations of chronic 
GVHD. Hence, patients with response by the proposed 
composite definitions had lower risks of subsequent failure, 
as compared to those who did not have responses by the 
same definitions. 

Fig. 3. Historical outcomes after secondary systemic therapy for 
chronic GVHD. The upper solid curve shows time to treatment failure 
defined as a qualitative change in systemic therapy or death during 
secondary therapy. The dashed curve shows time to treatment failure or 
recurrent malignancy during secondary therapy. The lower solid curve 
shows the cumulative incidence of discontinued systemic treatment 
after resolution of chronic GVHD. The dot on the dashed line indicates 
that approximately 40% of patients were alive at 1 year after the onset 
of secondary treatment without a qualitative change in systemic 
therapy and without recurrent malignancy. Chronic GVHD was defined 
according to historical criteria and might not reflect results to be 
expected for patients with chronic GVHD defined according to NIH 
criteria. The figure is adapted from reference [29]. *During secondary 
therapy. 

Progress has been hampered by the absence of any 
established benchmark of success that could be used as a 
comparison point for studies of new treatment. We have 
previously reviewed outcomes after secondary systemic 
treatment for chronic GVHD at our center [29]. 
Approximately 50% of patients died or had a qualitative 
change in systemic therapy during the first year, and an 
additional 10% had recurrent malignancy (Fig. 3). The pro- 
portion of patients who were alive without a subsequent 
change in systemic therapy and without recurrent malig- 
nancy was approximately 60% at 6 months and 40% at 1 
year. The proportion of patients with complete or partial 
response at 1 year was not determined, but it cannot exceed 
40%. These results might not be representative of current 
outcomes, since historical criteria were used to define chronic 
GVHD. 



HIGHER STANDARDS FOR PUBLICATION 
- A PROPOSAL FOR THE FUTURE 



Editors of the Journal of Clinical Oncology have recognized 
the importance of improving the conduct and reporting of 
phase II trials testing treatments for cancer [105]. As a 
guideline for authors, the editors have summarized criteria 
for the types of phase II studies that would be most 
appropriate for consideration by the journal. The editors 
also expressed the hope that their views might assist other 
journals that may be struggling to prioritize the most im- 
portant types of phase II trials for publication. 

The editorial position at the Journal of Clinical Oncology 
is that phase II studies will be considered only if they include 
1) a clear definition of the primary end point, 2) a hy- 
pothesized value of the primary end point that justified the 
planned sample size, and 3) a discussion of possible weak- 
nesses, such as any comparison to historical controls [106]. 
Only one of the 31 prospective studies of secondary treatment 
for chronic GVHD had a formal estimation of sample size 
with a well-justified historical benchmark. The Journal of 
Clinical Oncology also requires on-line publication of a 
redacted version of the study protocol, thereby enabling 
reviewers and readers to recognize any differences between 
the reported results and the study as originally planned. 
Improved treatment for chronic GVHD is more likely to 
emerge if reviewers and journal editors hold authors to higher 
standards in evaluating manuscripts for publication. 



CONCLUSION - THE FUTURE 



Standardization of methods for clinical trials would enable 
comparison of results in different trials and thereby accelerate 
progress in evaluation of new treatments for chronic GVHD. 
An urgent priority is the development of a benchmark of 
success based on results of unbiased retrospective reviews 
or prospective studies that include all patients who received 
secondary systemic therapy for chronic GVHD. Robust phase 
II studies could then be carried out to evaluate whether 
new therapies offer any genuine improvement compared 
to the benchmark. For example, a clinical trial could be 
designed to test whether a new treatment improves outcomes 
compared to the 40% historical response rate at 1 year, as 
described above. If the true response rate with the new 
treatment were 60%, enrollment of 42 patients would offer 
80% statistical power with a one-sided type I error of 0.05, 
and successful outcomes in at least 22 patients would 
encourage further studies. Alternatively, randomized trials 
with a "pick-the-winner" design could be used to identify 
approaches that truly warrant further evaluation. 
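
The design arithmetic in this paragraph can be checked with an exact binomial calculation. The following Python sketch is illustrative only: the function names are hypothetical, and the text does not state which method (exact or approximate) produced the figures of 42 patients and 22 successes, so the computed threshold and power need not match them exactly.

```python
from math import comb


def upper_tail(n: int, r: int, p: float) -> float:
    """Exact P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def critical_value(n: int, p0: float, alpha: float) -> int:
    """Smallest r with P(X >= r | p0) <= alpha (one-sided test).

    Returns n + 1 if no attainable count of successes rejects the null.
    """
    for r in range(n + 1):
        if upper_tail(n, r, p0) <= alpha:
            return r
    return n + 1


def design_summary(n: int, p0: float, p1: float, alpha: float) -> dict:
    """Threshold, attained type I error, and power of a single-stage design."""
    r = critical_value(n, p0, alpha)
    return {
        "threshold": r,
        "type_I_error": upper_tail(n, r, p0),
        "power": upper_tail(n, r, p1),
    }


# Historical benchmark 40%, hoped-for rate 60%, one-sided alpha 0.05.
print(design_summary(42, 0.40, 0.60, 0.05))

# Operating characteristics of the threshold of 22 successes quoted in the text.
print(upper_tail(42, 22, 0.40), upper_tail(42, 22, 0.60))
```

Running the script prints the rejection threshold, the attained type I error, and the power for 42 patients, followed by the same two probabilities for a threshold of 22 successes, so that the quoted design can be compared with the exact calculation.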

Promising candidate treatments identified in robust phase 
II studies could be taken forward in phase III studies of 
secondary treatment, and successful results in such a study 
would establish a new benchmark for future phase II studies 
of secondary therapy. Promising candidate treatments could 
also be tested in phase II studies of primary treatment. Most 
importantly, successful results in phase III studies of either 
secondary or primary treatment would improve patient 
outcomes and establish new standards of care. 